Feature Selection and Case Selection Methods Based on Mutual Information in Software Cost Estimation

Authors

  • Shihai Shi
  • Qin Liu

Abstract

Software cost estimation is one of the most crucial processes in software development management because it feeds into many management activities, such as project planning, resource allocation and risk assessment. Accurate software cost estimation not only helps in making investment and bidding plans but also enables a project to be completed within limited cost and time. This master's thesis focuses on feature selection and case selection methods, with the goal of improving the accuracy of software cost estimation models. Case-based reasoning (CBR) is a current focus of research in software cost estimation: it predicts the cost of a new software project by constructing an estimation model from historical software projects. To construct this model, CBR in software cost estimation must pick out relatively independent candidate features that are relevant to the estimated feature. However, many sequential-search feature selection methods in current use cannot precisely measure the redundancy among candidate features. In addition, when the local distances of candidate features are combined into the global distance between two software projects during case selection, the differing impact of each candidate feature has not been established. To address these two problems, this thesis explores solutions with support from the NSFC. It proposes a feature selection algorithm based on hierarchical clustering, which groups similar candidate features into the same cluster and, from each cluster, selects the feature most similar to the estimated feature as the representative feature. These representative features form the candidate feature subsets. Evaluation metrics are applied to these candidate feature subsets, and the subset that yields the best performance is taken as the final result of feature selection.
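The clustering-based feature selection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not specify the similarity measure, the linkage rule, or how many clusters are formed, so the sketch assumes discrete-valued features, a feature-to-feature distance of 1 - SU (where SU is symmetric uncertainty, 2 * I(X;Y) / (H(X) + H(Y))), and average linkage. The function names and toy data are hypothetical, and the final step of scoring each candidate subset with an estimation model is not shown.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(xs):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())


def symmetric_uncertainty(xs, ys):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0  # undefined when both are constant; 0.0 is a convention here
    mi = hx + hy - entropy(list(zip(xs, ys)))  # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return 2.0 * mi / (hx + hy)


def hierarchical_feature_selection(features, target, n_clusters):
    """Hypothetical helper: cluster similar features, keep one per cluster.

    features: dict mapping feature name -> list of discrete values
    target:   discrete values of the estimated feature (e.g. binned effort)
    """
    names = list(features)
    # Assumed feature-to-feature distance: 1 - SU (redundant features are close).
    dist = {}
    for a, b in combinations(names, 2):
        dist[a, b] = dist[b, a] = 1.0 - symmetric_uncertainty(features[a], features[b])

    def linkage(c1, c2):
        # Assumed average linkage: mean pairwise distance between two clusters.
        return sum(dist[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is still valid
        clusters[i] = merged

    # Representative = the cluster member most similar to the estimated feature.
    return sorted(max(c, key=lambda f: symmetric_uncertainty(features[f], target))
                  for c in clusters)


if __name__ == "__main__":
    # Toy discrete data (hypothetical); "size_copy" is deliberately redundant.
    features = {
        "size": [1, 1, 2, 2, 3, 3, 1, 2],
        "size_copy": [1, 1, 2, 2, 3, 3, 1, 2],
        "team": [0, 1, 0, 1, 0, 1, 0, 1],
    }
    effort_level = [1, 1, 2, 2, 3, 3, 1, 2]
    # One candidate subset per number of clusters.
    for k in range(1, len(features) + 1):
        print(k, hierarchical_feature_selection(features, effort_level, k))
```

Generating one candidate subset per cluster count mirrors the idea of building several candidate subsets and keeping the best-scoring one. In the toy data, the redundant `size_copy` disappears as soon as it is merged into the same cluster as `size`, because only one representative per cluster survives.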
The experimental results show that the proposed algorithm improves PRED(0.25) by 12.6% and 3.75% over other sequential-search feature selection methods on the ISBSG and Desharnais datasets, respectively. In addition, this thesis defines candidate feature weights using symmetric uncertainty, a measure that originates from information theory. The feature weight reflects the impact of each candidate feature with respect to the estimated feature. Experimental results demonstrate that applying feature weights improves the PRED(0.25) value of the estimation model by 8.9% compared with the model without feature weights. Finally, the thesis discusses and analyzes the drawbacks of the proposed ideas and outlines some directions for improvement.
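The symmetric-uncertainty feature weights and their role in case selection can be sketched as follows. This is a hedged illustration, not the thesis's method: the abstract defines the weights with symmetric uncertainty but does not give the discretization, the way local distances are combined, the number of analogies, or the adaptation rule. The sketch therefore assumes equal-width binning of numeric values before computing SU against the estimated feature (effort), a weighted Euclidean global distance over range-normalized local distances, and the mean effort of the k nearest historical projects as the estimate. PRED(0.25) is computed in its usual form: the share of projects whose magnitude of relative error, |actual - estimate| / actual, is at most 0.25. All function names, `bins`, `k`, and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(xs):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())


def symmetric_uncertainty(xs, ys):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0  # undefined when both are constant; 0.0 is a convention here
    mi = hx + hy - entropy(list(zip(xs, ys)))  # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return 2.0 * mi / (hx + hy)


def discretize(values, bins=3):
    """Equal-width binning into `bins` intervals (assumed discretization)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant column: avoid division by zero
    return [min(int((v - lo) / width), bins - 1) for v in values]


def su_weights(cases, efforts, bins=3):
    """Weight of each feature = SU(discretized feature, discretized effort)."""
    target = discretize(efforts, bins)
    return [symmetric_uncertainty(discretize(list(col), bins), target)
            for col in zip(*cases)]


def global_distance(a, b, weights, ranges):
    """Assumed weighted Euclidean combination of range-normalized local distances."""
    return math.sqrt(sum(w * ((x - y) / r) ** 2
                         for x, y, w, r in zip(a, b, weights, ranges)))


def estimate(new_case, cases, efforts, weights, k=2):
    """Mean effort of the k historical cases closest to `new_case`."""
    ranges = [(max(col) - min(col)) or 1.0 for col in zip(*cases)]
    nearest = sorted(range(len(cases)),
                     key=lambda i: global_distance(new_case, cases[i], weights, ranges))[:k]
    return sum(efforts[i] for i in nearest) / len(nearest)


def pred(actuals, estimates, level=0.25):
    """PRED(level): share of projects with |actual - estimate| / actual <= level."""
    hits = sum(abs(a - e) / a <= level for a, e in zip(actuals, estimates))
    return hits / len(actuals)


if __name__ == "__main__":
    # Toy data (hypothetical): [size, team_experience] per project, effort per project.
    cases = [[10, 3], [12, 2], [30, 4], [33, 1], [60, 5], [65, 2]]
    efforts = [100, 130, 310, 350, 640, 700]
    estimates = []
    for i in range(len(cases)):  # leave-one-out evaluation
        train = cases[:i] + cases[i + 1:]
        train_eff = efforts[:i] + efforts[i + 1:]
        w = su_weights(train, train_eff)
        estimates.append(estimate(cases[i], train, train_eff, w, k=2))
    print("PRED(0.25) =", pred(efforts, estimates))
```

Passing a weight of 1.0 for every feature to `estimate` gives the unweighted variant, which plays the role of the "without feature weight" configuration in this sketch; a feature whose SU with effort is near zero contributes almost nothing to the global distance.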

Similar resources

A study of mutual information based feature selection for case based reasoning in software cost estimation

Software cost estimation is one of the most crucial activities in software development process. In the past decades, many methods have been proposed for cost estimation. Case Based Reasoning (CBR) is one of these techniques. Feature selection is an important preprocessing stage of case based reasoning. Most existing feature selection methods of case-based reasoning are ‘wrappers’ which can usua...

Bridging the semantic gap for software effort estimation by hierarchical feature selection techniques

Software project management is one of the significant activities in the software development process. Software Development Effort Estimation (SDEE) is a challenging task in software project management. SDEE has been practiced in the computer industry since the 1940s and has been reviewed several times. A SDEE model is appropriate if it provides the accuracy and confidence simultaneously before softwa...

Improvement of effort estimation accuracy in software projects using a feature selection approach

In recent years, utilization of feature selection techniques has become an essential requirement for processing and model construction in different scientific areas. In the field of software project effort estimation, the need to apply dimensionality reduction and feature selection methods has become an inevitable demand. The high volumes of data, costs, and time necessary for gathering data, ...

Feature Selection Using Multi Objective Genetic Algorithm with Support Vector Machine

Different approaches have been proposed for feature selection to obtain a suitable feature subset among all features. These methods search the feature space for feature subsets that satisfy some criteria or optimize several objective functions. The objective functions are divided into two main groups: filter and wrapper methods. In filter methods, feature subsets are selected according to some measu...

Mental Arithmetic Task Recognition Using Effective Connectivity and Hierarchical Feature Selection From EEG Signals

Introduction: Mental arithmetic analysis based on Electroencephalogram (EEG) signal for monitoring the state of the user’s brain functioning can be helpful for understanding some psychological disorders such as attention deficit hyperactivity disorder, autism spectrum disorder, or dyscalculia where the difficulty in learning or understanding the arithmetic exists. Most mental arithmetic recogni...

Publication date: 2014